Parallel Graph-Based Semi-Supervised Learning
Abstract
Semi-supervised learning (SSL) is the process of training decision functions using small amounts of labeled data and relatively large amounts of unlabeled data. In many applications, annotating training data is time-consuming and error-prone. Speech recognition is a typical example: producing an accurate system requires large amounts of meticulously annotated speech data (Evermann et al., 2005). In the case of document classification for Internet search, it is not even feasible to accurately annotate a relatively large number of web pages for all categories of potential interest. SSL lends itself as a useful technique in many machine learning applications, as one needs to annotate only relatively small amounts of the available data. SSL is related to the problem of transductive learning (Vapnik, 1998). In general, a learner is transductive if it is designed for prediction only on a closed data set, where the test set is revealed at training time. In practice, however, transductive learners can be modified to handle unseen data (Sindhwani et al., 2005; Zhu, 2005a). Chapter 25 of (Chapelle et al., 2007) gives a full discussion of the relationship between SSL and transductive learning. In this chapter, SSL refers to the semi-supervised transductive classification problem. Let x ∈ X denote the input to the decision function (classifier) f, and let y ∈ Y denote its output label, i.e., f : X → Y. In most cases f(x) = argmax_{y ∈ Y} p(y|x). In SSL, certain reasonable assumptions are made so that properties of the distribution p(x) (which can be estimated from the unlabeled data sampled from p(x)) can influence p(y|x). These assumptions are as follows:
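To make the graph-based flavor of SSL concrete, the sketch below runs label propagation on a tiny chain graph: two nodes are labeled, and the remaining nodes receive labels by repeatedly averaging their neighbors' label distributions. The graph, the affinity matrix W, and the iteration count are illustrative assumptions for this toy example, not the chapter's specific algorithm.

```python
import numpy as np

# Toy 5-node chain graph 0-1-2-3-4; W is a hypothetical symmetric affinity matrix.
W = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

n, k = 5, 2                 # 5 nodes, 2 classes
Y = np.zeros((n, k))
Y[0, 0] = 1.0               # node 0 is labeled class 0
Y[4, 1] = 1.0               # node 4 is labeled class 1
labeled = [0, 4]

# Start unlabeled nodes at the uniform distribution, then propagate.
F = np.full((n, k), 1.0 / k)
F[labeled] = Y[labeled]
for _ in range(100):
    F = W @ F                            # average label mass over neighbors
    F /= F.sum(axis=1, keepdims=True)    # renormalize rows to distributions
    F[labeled] = Y[labeled]              # clamp the labeled nodes

pred = F.argmax(axis=1)
print(pred)  # → [0 0 0 1 1] (node 2 is an exact tie; argmax breaks toward class 0)
```

Nodes closer to the labeled node of each class inherit that class, which is exactly the smoothness intuition: p(x) (here, graph proximity) shapes p(y|x).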
Similar resources
Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs
We describe a computationally efficient, stochastic graph-regularization technique that can be utilized for the semi-supervised training of deep neural networks in a parallel or distributed setting. We utilize a technique, first described in [13] for the construction of mini-batches for stochastic gradient descent (SGD) based on synthesized partitions of an affinity graph that are consistent wi...
Large-Scale Graph-based Transductive Inference
We consider the issue of scalability of graph-based semi-supervised learning (SSL) algorithms. In this context, we propose a fast graph node ordering algorithm that improves (parallel) spatial locality by being cache-cognizant. This approach allows a near-linear speedup to be achieved on a shared-memory parallel machine, and thus means that graph-based SSL can scale to very large data set...
Parallel and Distributed Approaches for Graph Based Semi-supervised Learning
Two approaches for graph based semi-supervised learning are proposed. The first approach is based on iteration of an affine map. A key element of the affine map iteration is sparse matrix-vector multiplication, which has several very efficient parallel implementations. The second approach belongs to the class of Markov Chain Monte Carlo (MCMC) algorithms. It is based on sampling of nodes by per...
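The affine-map iteration mentioned above can be sketched as a fixed-point update F ← αSF + (1−α)Y, whose per-step cost is dominated by a sparse matrix-vector product; for α ∈ (0, 1) the map is a contraction, so the iteration converges. The value of α, the chain graph, and the row-stochastic normalization below are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

alpha = 0.9  # assumed damping factor in (0, 1); guarantees contraction

# Same toy 5-node chain graph, stored sparse (nonzero entries only).
rows = [0, 1, 1, 2, 2, 3, 3, 4]
cols = [1, 0, 2, 1, 3, 2, 4, 3]
W = csr_matrix((np.ones(8), (rows, cols)), shape=(5, 5))
deg = np.asarray(W.sum(axis=1)).ravel()
S = csr_matrix(W.multiply(1.0 / deg[:, None]))   # row-stochastic normalization

Y = np.zeros((5, 2))
Y[0, 0] = 1.0   # node 0 labeled class 0
Y[4, 1] = 1.0   # node 4 labeled class 1

F = Y.copy()
for _ in range(500):
    # Sparse matvec dominates the cost and parallelizes well.
    F = alpha * (S @ F) + (1 - alpha) * Y

pred = F.argmax(axis=1)
print(pred)
```

Because each update touches only the nonzero entries of S, the iteration scales with the number of graph edges rather than with n², which is what makes the parallel implementations mentioned in the abstract attractive.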
Cross-Domain Kernel Induction for Transfer Learning
The key question in transfer learning (TL) research is how to make model induction transferable across different domains. Common methods so far require source and target domains to have a shared/homogeneous feature space, or the projection of features from heterogeneous domains onto a shared space. This paper proposes a novel framework, which does not require a shared feature space but instead ...
Query-focused Multi-Document Summarization: Combining a Topic Model with Graph-based Semi-supervised Learning
Graph-based learning algorithms have been shown to be an effective approach for query-focused multi-document summarization (MDS). In this paper, we extend the standard graph ranking algorithm by proposing a two-layer (i.e. sentence layer and topic layer) graph-based semi-supervised learning approach based on topic modeling techniques. Experimental results on TAC datasets show that by considerin...
Publication year: 2012